van der Mark et al. BMC Pulmonary Medicine 2012, 12:63
http://www.biomedcentral.com/1471-2466/12/63 






RESEARCH ARTICLE Open Access 



A systematic review with attempted network 
meta-analysis of asthma therapy recommended 
for five to eighteen year olds in GINA steps three 
and four 

Lonneke B van der Mark 1 *, PH Edo Lyklema 1 , Ronald B Geskus 2 , Jacob Mohrs 1 , Patrick JE Bindels 3 , 
Wim MC van Aalderen 4 and Gerben ter Riet 1 



Abstract 

Background: The recommendations for the treatment of moderate persistent asthma in the Global Initiative for 
Asthma (GINA) guidelines for paediatric asthma are mainly based on scientific evidence extrapolated from studies in 
adults or on consensus. Furthermore, clinical decision-making would benefit from formal ranking of treatments in 
terms of effectiveness. 

Our objective was to assess all randomized trial-based evidence specifically pertaining to 5-18 year olds with
moderate persistent asthma, and to rank the different drug treatments of GINA guideline steps 3&4 in terms of
effectiveness.

Methods: Systematic review with network meta-analysis. After a comprehensive search in Central, Medline, Embase,
CINAHL and the WHO search portal, two reviewers selected RCTs performed in 4,129 children aged 5 to 18 years
with moderate persistent asthma that compared any GINA step 3&4 medication options. The reviewers further assessed
quality according to the Cochrane Collaboration's tool, extracted data from the included papers and built a network
of the trials. We attempted to rank treatments with formal statistical methods employing direct and indirect
(e.g. through placebo) connections between all treatments.

Results: 8,175 references were screened; 23 randomized trials (RCTs), comparing head-to-head (n=17) or against
placebo (n=10), met the inclusion criteria. Except for theophylline as add-on therapy in step 4, a closed network 
allowed all comparisons to be made, either directly or indirectly. Huge variation in, and incomplete reporting of, 
outcome measurements across RCTs precluded assessment of relative efficacies. 

Conclusion: Evidence-based ranking of effectiveness of drug treatments in GINA steps 3&4 is not possible yet. 
Existing initiatives for harmonization of outcome measurements in asthma trials need urgent implementation. 
Background

Clinical guidelines contain systematically developed
statements to help practitioners make optimal healthcare
decisions [1]. The Global Initiative for Asthma (GINA)
guideline is a major step forward in achieving best possible
asthma control [2]. The GINA guideline uses symptoms,
exacerbations, airflow limitation, and lung function
variability to categorize asthma severity into intermittent, mild
persistent, moderate persistent or severe persistent. GINA
suggests that 5 to 18 year-olds whose symptoms are
insufficiently controlled after three months of treatment at a
particular GINA step move up a step (see Table 1).

There is level A evidence (see glossary) on the
effectiveness of short-acting β2-agonists (SABA; step 1) and
of adding a low dose inhaled glucocorticosteroid (ICS)
(step 2) in children with mild asthma [2]. However,
although the level of evidence for GINA step 3&4
recommendations for children older than 5 years is deemed A
to B, level A evidence to guide step-up therapy is lacking
for this age group. Scrutiny of the randomized trials
(RCTs) underlying the guideline reveals that some are
outdated, because children used daily oral prednisone
(see for example [3,4]), or compare step 2 with step 3
(see for example [5]). This leaves only five RCTs
comparing treatments of step 3&4 for this age group [6-10].

Network meta-analysis (NMA), also known as indirect 
comparisons, exploits the mathematical property that 
(A - B) - (A - C) = A - B - A + C = C - B. It enables 
one to formally compare drugs B and C although these 
were never compared head-to-head [11-13]. NMA has 
major advantages over classic meta-analysis; it formally 
ranks treatment effects in case more than two treatments 
are involved; it circumvents the usual overrepresentation 
of drug comparisons to placebo, which may not always 
be the most informative for practising physicians [14]. 
We set out, using NMA methodology, to compare GINA 
step 3&4 drug treatment efficacies in 5 to 18 year-old 
children/adolescents with moderate persistent asthma. 
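
The identity above can be illustrated with a minimal sketch of an adjusted indirect comparison. It assumes that the two direct estimates come from independent trials, so that their variances add; this assumption, the function name and all numbers are illustrative and are not taken from the trials in this review.

```python
import math

def indirect_comparison(d_ab, se_ab, d_ac, se_ac):
    """Indirect estimate of C versus B from two comparisons sharing arm A.

    d_ab is the estimated difference A - B, d_ac the estimated difference A - C.
    Assumption: the two estimates are independent, so their variances add.
    """
    d_cb = d_ab - d_ac                          # (A - B) - (A - C) = C - B
    se_cb = math.sqrt(se_ab ** 2 + se_ac ** 2)  # variance of a difference
    ci_95 = (d_cb - 1.96 * se_cb, d_cb + 1.96 * se_cb)
    return d_cb, se_cb, ci_95

# Hypothetical inputs, for illustration only.
d_cb, se_cb, ci_95 = indirect_comparison(d_ab=2.0, se_ab=3.0, d_ac=-1.0, se_ac=4.0)
print(d_cb, se_cb, ci_95)
```

Note that the indirect standard error is larger than either direct one, which is why indirect evidence is usually weaker than a head-to-head trial of similar size.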

Methods 

Search strategy 

A trained clinical librarian performed a comprehensive
literature search for relevant RCTs in the Cochrane
Central Register of Controlled Trials (Central), Medline
(Pubmed), Embase, CINAHL and ongoing trial registers
registered on WHO Search Portal [15], published until
4 February 2010 (for search details, see Additional file 1).
In addition, two reviewers (LvdM, PhEL) scrutinized
reference lists of included articles, the GINA-guideline and
relevant systematic reviews.

Inclusion and exclusion criteria 

We included RCTs conducted in participants aged 5 to
18 years with moderate persistent asthma and
comparing any GINA step 3&4 medication options (see
Additional file 2) to each other or against placebo, with
a follow-up duration of at least four weeks after start of
the intervention. There were no language restrictions.
Acceptable outcome measurements were: spirometry
(forced expiratory volume in 1 second (FEV1), forced
vital capacity (FVC), FEV1/FVC ratio, forced expiratory
flow 25%-75% (FEF25-75), peak expiratory flow (PEF)),
methacholine challenge test (PC20-FEV1), fractional
exhaled nitric oxide (FeNO), asthma symptom score,
use of β2-agonists as breakthrough medication, and
quality of life.

If results did not pertain to the 5 to 18 years age
category, the trial was excluded with one exception: RCTs
including 4-year olds were included if mean or median
age was between 5 and 18 years. Studies were excluded if
they compared add-on medication to a non-standardised
dose of ICS. Cross-over studies not reporting on
treatment effects for each separate treatment period were
also excluded since carry-over effects cannot be
excluded and are extremely difficult to handle [16].

Selection 

Two reviewers (LvdM, PhEL) independently assessed 
titles and abstracts of all identified citations against the 
inclusion criteria. Any disagreements were resolved by 
consensus; in case of doubt references were included. 
LvdM and PhEL evaluated in full text all papers thus 
selected against the inclusion criteria. 

Data extraction 

LvdM and PhEL extracted, not in duplicate, data on
author, source and year of publication, language, study
design, interventions (medication, way of administration,
dose, and frequency), population summary characteristics
(me(di)an age, asthma severity) and study outcomes. If



Table 1 GINA recommended treatment steps for 5 to 18 year olds

Step 1   Step 2         Step 3 (Select one)               Step 4 (Add one or more)
SABA     Low dose ICS   A. Medium- or high dose ICS       A. Medium- or high dose ICS + LABA
                        B. Low dose ICS + LABA            B. LTRA
                        C. Low dose ICS + LTRA            C. Theophylline
                        D. Low dose ICS + Theophylline

SABA = rapidly acting β2-agonists; ICS = Inhaled Corticosteroids; LABA = Long-acting β2-adrenoceptor agonists; LTRA = Leukotriene modifier.



Table 2 Overall score (Y/N/?): 60 / 9 / 138

Each trial could score "yes" for a low, "no" for a high and "?" for an uncertain risk of bias, respectively.

The 9 items are based on a combination of the Cochrane approach to assess the risk of bias, combined with the validity checklist of Jadad et al. [17,18].
possible, we calculated the mean change from baseline to
endpoint for each trial arm. To facilitate meta-analysis, we 
contacted authors and sponsors of included studies for 
additional information such as outcomes expressed on 
other scales, (mean) patient characteristics such as height, 
and statistics such as standard errors if needed. We asked 
for separate data (summaries) of participants in the 5 to 
18 years range if the trial had combined this group with 
younger or older participants. 

Quality assessment 

Methodological quality of all included trials was assessed 
on 9 items [17,18] (see Table 2). The risk of bias scale 
was developed using the Cochrane Collaboration's tool 
for assessing risk of bias [17]. All items were scored as 
"yes" for low, "no" for high, and "?" for uncertain risk of 
bias, respectively. 

Statistical analysis 

Many network meta-analyses have been based on
dichotomous outcomes for each trial. In our study, outcomes were
mostly continuous. To take lung function as an example,
meta-analysis would have been possible if, for each treatment
arm, every publication had reported change in mean
FEV1%pred and its standard error after a suitable period
of follow-up. Unfortunately, several trials only reported
FEV1 (L); we therefore made some efforts to salvage the
problem by converting the FEV1 (L) value into FEV1%pred.
Ideally, we would have had access to individual patient
data (IPD) for each trial in the review. In our case we
simulated IPD using the summary statistics reported.
We simulated 1000 virtual children from a general
population with age, height and sex distribution based
on the available data on mean age, height and sex per
trial arm. Next, we calculated a corresponding FEV1 (L)
value per virtual child, using existing formulas. In a final
step, for each trial arm, we tried to calculate a mean
FEV1%pred and a corresponding SD from the simulated
data, to be used for meta-analysis. (For details on the
statistical analysis see Additional file 3) [37-41].
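
The kind of simulation described above could be sketched as follows. This is not the procedure of Additional file 3: the prediction equation is a hypothetical placeholder with made-up coefficients, the normal distributions for age and height are assumptions, and assigning the reported arm mean in litres to every virtual child is a simplification chosen only to keep the example short.

```python
import random
import statistics

def predicted_fev1_litres(age_years, height_cm, is_male):
    """HYPOTHETICAL reference equation with placeholder coefficients.

    A real analysis would substitute a published prediction equation.
    """
    value = -2.5 + 0.03 * height_cm + 0.02 * age_years + (0.1 if is_male else 0.0)
    return max(0.3, value)  # guard against implausibly small predictions

def simulate_arm(mean_fev1_l, mean_age, sd_age, mean_height, sd_height,
                 prop_male, n=1000, seed=0):
    """Simulate n virtual children for one trial arm.

    Returns the mean and SD of FEV1 expressed as % of the predicted value.
    """
    rng = random.Random(seed)
    pct_pred = []
    for _ in range(n):
        age = rng.gauss(mean_age, sd_age)           # assumed normal
        height = rng.gauss(mean_height, sd_height)  # assumed normal
        is_male = rng.random() < prop_male
        predicted = predicted_fev1_litres(age, height, is_male)
        # Simplification: each child gets the arm's reported mean FEV1 (L).
        pct_pred.append(100.0 * mean_fev1_l / predicted)
    return statistics.mean(pct_pred), statistics.stdev(pct_pred)

# Hypothetical arm-level summary statistics, for illustration only.
mean_pct, sd_pct = simulate_arm(mean_fev1_l=1.9, mean_age=10, sd_age=2,
                                mean_height=140, sd_height=8, prop_male=0.6)
print(round(mean_pct, 1), round(sd_pct, 1))
```

Under this simplification all spread in the simulated percentages comes from the variation in predicted values, so the simulated SD reflects the assumed population rather than the measured lung function.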

We also considered the use of Z-scores. However, the
SD was frequently missing and not provided after
request. Furthermore, Z-scores can only be compared if
the averages of both outcomes (FEV1 (L) and FEV1%pred)
differ by a multiplicative factor, equal to the quotient of
the standard errors. Since there were no studies that
Figure 1 Flowchart from database searches to inclusion of the trials. 7,152 of the 8,175 references were excluded because they did not fulfil
the inclusion criteria. Main reasons for exclusion were: reference was not a trial, wrong age group or no separate data for < 18 year olds, wrong
dosage, not asthma, follow up duration < 4 weeks or cross-over design.
provided both FEV1 (L) and FEV1%pred and their
respective standard errors, we could not check whether this
property was approximately correct in our data and
refrained from using this method.
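
The property in question can be made concrete with a small sketch. It treats a Z-score as a mean divided by its standard error and assumes a single constant conversion factor between the two scales; the factor and the numbers are hypothetical.

```python
def z_score(mean, se):
    """Z-score taken here as the mean divided by its standard error."""
    return mean / se

# Hypothetical mean change in FEV1 (L) and its standard error.
mean_l, se_l = 0.12, 0.04
# Hypothetical constant factor converting litres to % of predicted.
factor = 50.0

z_litres = z_score(mean_l, se_l)
z_pct_pred = z_score(factor * mean_l, factor * se_l)
print(z_litres, z_pct_pred)
```

When the mean and the standard error are rescaled by the same factor the two Z-scores coincide. If the ratio of the means differs from the ratio of the standard errors, the Z-scores on the two scales are no longer comparable.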

Results 

Studies and patients 

The comprehensive literature search yielded 8,175
references (see Figure 1). We retrieved 200 as full text
articles, representing 160 unique studies. Reference tracking
of the GINA guideline, systematic reviews and included
references did not yield additional references.
Twenty-three trials, conducted between 1984 and 2010, met the
inclusion criteria and were included [6-10,19-36].
Additional file 4 shows the study characteristics of the
23 trials with 4,129 patients ranging in age from 4 to 18
years; we included 6 trials with a lower age range of 4
years, but with a mean or median between 5 and 18
years [6,9,29,31,32,36]. Figure 2 shows the network of
direct and indirect comparisons. There are 28
theoretically possible pair-wise comparisons: all 7 GINA options
versus placebo and 21 head-to-head comparisons, 3A
versus 3B, . . ., 3A versus 4C, and, taking the other
options as a starting point, all the way to 4B versus 4C.
The arrows represent the ten actually published direct
comparisons. The white boxes show the number of
RCTs and total number of participants for each
comparison. In total, we found seven different head-to-head
comparisons (with between 1 and 7 studies per
comparison) and 3 different comparisons with placebo (with
between 1 and 8 studies). All indirect comparisons were
possible, except for comparisons with GINA 4C (medium
dose ICS+theophylline as add-on to step 3), which is not
connected to the network. An example of a possible
indirect comparison replacing a non-existent direct
comparison is step 3A versus 4A via 3B. A more complicated
example is 3D versus 4A via 4B and 3B. An example
where both direct and indirect comparisons exist would
be 3B vs 3C, namely, direct via a N=63 trial, and indirect
via N=955 (899 (3B vs placebo) + 56 (placebo vs 3C))
participants. The latter example illustrates how NMA may
add strength to scarcely investigated direct comparisons.
Figure 2 shows that 3A versus 3B (N=776), 3B versus
placebo (N=899) and 3B versus 4A (N=1977) are
relatively well researched, while most other comparisons
depend on weak statistical evidence. However, there are
Figure 2 The network of included trials in GINA step 3&4. ICS = Inhaled Corticosteroids; LABA = Long-acting β2-adrenoceptor agonists; SR =
Sustained release. The arrows represent the direct comparisons found in the included RCTs, including the number of RCTs and total number of
participants. Except for SR theophylline as add-on to step 3, all treatments are directly or indirectly connected to each other. An example of an
indirect comparison replacing a non-existent comparison is step 3A to 4A, through 3A to 3B, and 3B to 4A.
some comparisons that benefit from the relatively strong
3B versus placebo connection, for example, 3A versus 
3B, and 3B versus 3C. 
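
The counts and the connectivity statement in this section can be checked with a short sketch. The edge list contains only the direct comparisons that are named or implied by the examples in the text; it is a partial list (seven of the ten published direct comparisons), and reading the examples as edges is an assumption.

```python
from itertools import combinations

nodes = ["placebo", "3A", "3B", "3C", "3D", "4A", "4B", "4C"]

# All theoretically possible pair-wise comparisons among 8 nodes.
pairs = list(combinations(nodes, 2))
vs_placebo = [p for p in pairs if "placebo" in p]
head_to_head = [p for p in pairs if "placebo" not in p]

# Partial edge list inferred from the examples in the text (assumption).
edges = [("3A", "3B"), ("3B", "placebo"), ("3B", "4A"), ("3B", "3C"),
         ("placebo", "3C"), ("3D", "4B"), ("4B", "3B")]

def connected_to(start, edge_list):
    """Return every node reachable from start over undirected edges."""
    seen = {start}
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for a, b in edge_list:
            for x, y in ((a, b), (b, a)):
                if x == node and y not in seen:
                    seen.add(y)
                    frontier.append(y)
    return seen

reachable = connected_to("placebo", edges)
disconnected = [n for n in nodes if n not in reachable]
print(len(pairs), len(vs_placebo), len(head_to_head), disconnected)
# prints: 28 7 21 ['4C']
```

Even with this partial edge list every option except 4C can be reached from placebo, in line with the description of the network above.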

The high number of question marks (138/207, or 67%)
in Table 2 indicates that incomplete, unclear or
non-reporting hampered thorough quality assessment. Eleven
out of 23 trials reported on compliance, while only 4
reported on blinding of the physician or how missing
values were dealt with.

Outcome measurements 

We found enormous variation in choice of outcome
measures and how they were reported. Twenty-one
studies reported FEV1, but variation in methods of
reporting was quite extreme (Table 3). None of the
studies reported IPD. One study reported the method of
converting "liters" to "percentage of predicted" (e.g.
Quanjer, Zapletal, Polgar or Hankinson) [32]. Thus,
although FEV1 is an outcome that 21/23 studies reported
in some form, the results could not be compared
straightforwardly, nor pooled. Pooling outcomes on
asthma symptoms, the second best, was also not possible
(see Additional file 5).

Attempts to salvage the situation 

FEV1 values depend on sex, age, and height. FEV1 values
are usually not normally distributed and extreme values
occur, skewing the mean [31]. Besides differences in
reporting of litres and percentages of predicted, the mix of
outcome measures and statistical details was often
reported unsystematically and awkwardly. Intra-arm
differences instead of between-arm differences were often
reported, while descriptive statistics (standard deviation,
range) were used where inferential statistics (standard
errors, confidence intervals) were needed. These
inconsistencies or mistakes thwarted our attempts at pooling of
results and made a sensible summary difficult altogether.
We contacted authors or sponsors for more details (e.g.
summaries of patient characteristics for height, IPD,
alternative outcome measurements such as the mean
difference between the groups with corresponding standard
errors, different time point of follow-up) to allow
expressing the results on identical scales. Unfortunately, we
received additional information through these personal
communications in only four instances [7,8,22,34].
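
To illustrate the distinction between descriptive and inferential statistics drawn above, the sketch below derives a between-arm difference with its standard error from per-arm means, standard deviations and sample sizes. It assumes independent arms and a normal approximation for the 95% confidence interval; the function name and the numbers are hypothetical.

```python
import math

def between_arm_difference(mean1, sd1, n1, mean2, sd2, n2):
    """Between-arm difference with SE and 95% CI from per-arm summaries.

    Assumptions: independent arms, normal approximation for the interval.
    """
    se1 = sd1 / math.sqrt(n1)   # standard error of the mean in arm 1
    se2 = sd2 / math.sqrt(n2)   # standard error of the mean in arm 2
    diff = mean1 - mean2
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    ci_95 = (diff - 1.96 * se_diff, diff + 1.96 * se_diff)
    return diff, se_diff, ci_95

# Hypothetical mean changes in FEV1%pred in two arms of 100 children each.
diff, se_diff, ci_95 = between_arm_difference(5.0, 10.0, 100, 2.0, 10.0, 100)
print(diff, se_diff, ci_95)
```

The calculation needs the SD and the sample size of each arm; a mean with only a range, or a within-arm P-value, does not supply enough information to reconstruct it.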

We used those trials that reported both FEV1 (L) and
FEV1%pred to directly compare our simulation results
with those empirically measured in these trials, and
found that, regrettably, they were very different. In
particular, the ranges of the results were much
narrower than the empirically measured percentages of
predicted. In some cases, results from our conversion
method were opposite to the true results (simulated
result of FEV1>100% of predicted versus an observed
result of FEV1<100% of predicted) [6,30,31]. Because
of these considerable and irresolvable discrepancies,
we decided that formal meta-analysis would be
irresponsible. This decision was made easier by the deficient
reporting and potentially low methodological quality
of many trials.



Table 3 Reporting method of FEV1 for all included trials for each scale (litre or % of predicted) and statistical method
of reporting, at baseline (T0) and endpoint (Te)

Column headings: Scale of FEV1; Outcome summary measure; Reference number of study reporting summary measure at T0 (a); Reference number of study reporting summary measure at Te (b)

Litre
Outcome summary measures: Mean; Mean change; Mean + SE or SD; Mean change + SE or SD; Mean + range; P-value versus another arm; Difference between arms + 95%CI
Study numbers: 8, 10, 11, 16, 18, 22, 23 | 14, 15, 16, 19 | 13, 15, 19 | 10, 14, 18 | 8, 16 | 23 | 16 | 10, 14, 15, ' | 13, 23 | 18, 19, 23

% of predicted
Mean: 13
Mean change: 11, 12, 18
Mean + SE or SD: 2, 3, 5, 6, 7, 9, 10, 11, 16, 17, 18, 20, 21, 22 | 2, 3, 6, 7, 9, 16, 17, 20
Mean change + SE or SD:
Mean + range: 12, 13, 14, 15, 16, 17, 19 | 16
P-value versus another arm; Difference between arms + 95%CI: 13, | 12, 18 | 17

a T0 = baseline of the trial.
b Te = endpoint of the trial, varying from 4 to 56 weeks.
Descriptive findings of key trials

Since we are not able to pool the data and establish an
evidence-based ranking of effectiveness of drug
treatments in GINA steps 3 and 4, we describe the main
findings from the trials of the most frequently compared
interventions (>2 trials/comparison), including over 100
patients per intervention group and having a follow up
of at least 8 weeks. Many different outcomes are
reported. However, in the trial descriptions below we
restrict our focus to the following clinically most relevant
ones: number of exacerbations, level of control, reliever
medication use, symptom score, frequency of night-time
awakening, quality of life, FEV1, hyperresponsiveness
(PC20-FEV1) and PEF. The only interventions compared
more than two times were steps 3A versus 3B, that is,
adding a LABA to a low dose of ICS or increasing the
dose of ICS, and 3B versus 4A, that is, adding a LABA
to medium or high dose ICS.

Step 3A versus 3B 

Three trials [6,32,36] published between 2006 and 2009
compared a medium or high dose ICS to low dose ICS
plus LABA: Gappa et al. (age 4-16 years; N=138 and 145;
QA-score=4/9), Bisgaard et al. (age 4-11 years; N=117,
118 and 106; QA-score=2/9) and De Blic et al. (age 4-11
years; N=150 and 153; QA-score=7/9). Bisgaard et al., in a
3-armed trial, compared a fixed low dose of ICS plus
LABA, a non-fixed low dose ('SMART') of ICS plus a
LABA and a medium dose of ICS. The authors claim
significant effects from the 'SMART' regimen compared to
medium ICS or fixed dose. But according to the GINA
classification the two ICS plus LABA regimens are both GINA
3A, and for the purpose of this review we see no additional
value in comparing these two GINA 3A arms.
We excluded the results of the non-fixed-dose group from
this discussion. Because participants in the non-fixed-dose
group were allowed to take additional study medication
(ICS+LABA), only a mean number of as-needed
inhalations (daytime: 0.49 & nighttime: 0.09) in this group is
reported. Regrettably, no range or standard deviation is
mentioned. Therefore it is possible that some participants
were in fact treated according to GINA 4A.

As presented in Table 4, Gappa et al. as well as 
Bisgaard et al. found that adding LABA (3A) improved 
the level of control statistically significantly more than 
doubling the dose of ICS (3B). However, the trial by De 
Blic et al. was unable to confirm this. Gappa et al. and 
De Blic et al. found a statistically significantly lower use 
of rescue medication in the LABA group after 12 weeks 
compared to the ICS group, but Bisgaard et al. found no 



Table 4 Reported significant differences between intervention groups per trial

Comparison                                      | 3A-3B           | 3A-3B         | 3A-3B                 | 3B-4A              | 3B-4A                                                    | 3B-4A
Study                                           | Gappa [32]      | Bisgaard [6]  | De Blic [36]          | Tal [29]           | Morice [30]                                              | Pohunek [31]
Differences between groups                      | (95%CI)         | (p)           | (95%CI; p)            | (95%CI)            | (95%CI; p)                                               | (p)
------------------------------------------------+-----------------+---------------+-----------------------+--------------------+----------------------------------------------------------+---------------------------------
Number of exacerbations                         | n.a.            | n.d.          | n.d.                  | n.a.               | n.a.                                                     | n.a.
Level of control                                | p=0.02 a        | 9.8 (0.047) b | n.d.                  | n.d.               | n.d.                                                     | n.d.
Reliever medication use/reliever free days c    | 8.7 (1.2-16.3)  | n.d.          | 1.4 (0.0-3.4; 0.025)  | n.d.               | n.d.                                                     | n.d.
Symptom score (day & night)                     | n.d.            | 0.27 (0.024)  | n.a.                  | n.d.               | n.d.                                                     | n.d.
Nighttime awakening                             | n.a.            | n.d.          | n.d.                  | n.d.               | n.d.                                                     | n.d.
Quality of life                                 | n.a.            | n.a.          | n.a.                  | n.a.               | n.d.                                                     | n.d.
Lung function (FEV1)                            | n.d.            | n.d.          | n.d.                  | 3.75 (1.1-6.4) d   | n.a.                                                     | 0.08 (<0.01) f & 0.06 (<0.001) e
Hyperresponsiveness (PC20-FEV1)                 | n.a.            | n.a.          | n.a.                  | n.a.               | n.a.                                                     | n.a.
Morning PEF                                     | 6.1 (1.8-10.4) d| n.d.          | 7.6 (1.7-13.5) e      | 3.77 (1.84-5.7) d  | 9.5 (4.2-14.9; <0.001) e & 10.3 (5.0-15.6; <0.001) e,f   | 15 (<0.001) e & 6 (<0.001) e,f

a nr of weeks with successful asthma control.
b asthma-control days (%).
c Percentage of days without rescue medication.
d % of predicted.
e liter.
f For some studies 2 results are presented because three groups were compared in the trial.
n.a. = not available.
n.d. = no statistically significant differences.
difference. Only Bisgaard et al. found a significantly
better improvement of the symptom score in the LABA
group compared to the ICS group.

Overall, these larger trials seem to support the view that 
there is a larger benefit from adding LABA to a low dose 
of ICS than from doubling the dose of ICS (See Table 4). 

Step 3B versus 4A

Seven trials [8,10,21,29-31,35] published between 1995
and 2007 compared a medium or high dose of ICS to a
medium or high dose of ICS plus LABA. Three trials
[29-31] contained more than 100 patients per group and
had a follow up of more than 8 weeks: Tal et al. (age
4-17 years; N=138 and 148; QA-score=3/9), Pohunek et al.
(age 4-11 years; N=213, 201 and 216; QA-score=1/9)
and Morice et al. (age 6-11 years; N=212, 203 and 207;
QA-score=3/9). None of the trials found statistically
significant differences between the groups on number of
exacerbations, level of control, use of rescue medication,
symptom scores, nighttime awakenings or quality of
life. As presented in Table 4, Tal et al. and Pohunek
et al. both found a statistically significantly larger benefit
on FEV1 in the LABA group after 12 weeks compared to
the ICS group. All three trials found statistically
significant differences in favour of LABA on morning PEF
after 12 weeks.

The three studies described here seem to support the idea
that adding LABA to medium dose ICS is slightly more
effective, although as measured by lung function only.

Discussion 

We tried to synthesize the evidence for GINA step 3&4
recommendations for 5 to 18 year-olds with moderate
persistent asthma. Our aim was to rank the 21 different
GINA treatment options as to their effectiveness using
standard systematic review methods extended by
network meta-analytic techniques.

In principle, the situation looked favourable for network
meta-analysis, with RCTs on six out of seven interventions
either against placebo or head-to-head (Figure 2). Lack of
direct comparisons, for example GINA 3C versus 4A,
could have been compensated by indirect comparisons,
for example through GINA 3B and placebo. Only
theophylline was disconnected from the network of trials as we
found no trials in this age group.

Due to extremely different choices trialists made on
outcome reporting methods, we had to abandon attempts at
meta-analysis. Apart from embarking on a set of
concerted new trials in this area, which may take years to
complete, a potentially quicker way to salvage the
situation with existing data may be joint action among
sponsors and trialists of existing trials to aggregate their raw
data to inform an IPD meta-analysis [42,43]. The authors
of this review would be more than happy to support such
an endeavour, thereby achieving this review's original aim.
Such an exercise would depend also on the results of
additional trialist-provided information on trial quality, since
pooling of very low quality data is unattractive. This
brings us to the next point. We assessed the risk of bias in
the included trials on a 9-item methodological quality
checklist. We scored "?" if the risk of bias seemed hard to
determine. We scored 138/207 "?", and this is largely due
to partial, unclear or non-reporting (see Table 2).
Adoption and enforcement of the CONSORT statement should
become a priority for trialists and journals alike [44].

After criticizing some of the outcome reporting
methods, let us consider the strengths and limitations of our
own work. We comprehensively searched the literature
and tried to minimize the risk of missing RCTs by
tracking the references of the GINA-guideline, included RCTs
and relevant systematic reviews [45-47]. However, these
efforts yielded no additional relevant references. We
performed all major steps, except the extraction of the
quantitative data, in duplicate. Furthermore, our team
had expertise on all aspects of a systematic review:
clinical librarian, biostatistician, physician-epidemiologist,
two general practitioners, a trainee general practitioner,
and a paediatric pulmonologist. Nevertheless, our review
is no exception in that it may have been affected by
suppression of negative trial results, or publication bias [48].

As far as we are aware, a network meta-analysis on
this subject would have been novel. The majority of the
meta-analyses performed on these treatment options are
combined for paediatric and adult patients. In 2003,
Bisgaard analyzed the effect of long-acting β2-adrenoceptor
agonists (LABA) on the asthma exacerbation rate in
paediatric patients in a review of eight randomized
trials [46]. All trials compared a LABA with a SABA or
placebo in children on inhaled corticosteroids and
reported on exacerbations or asthma-related
hospitalizations in asthmatic children. Bisgaard, while providing the
spectrum of relative risks, refrained from formal
meta-analysis, because of differences in patient populations,
comparators, study design and duration, and definitions of
asthma exacerbation. He concluded that there is no
evidence in the existing paediatric literature that LABA
protects against asthma exacerbations, even when used as an
add-on therapy to ICS.

In line with our view that firm evidence to guide
step-up therapy is lacking, Lemanske et al. performed the
BADGER trial, a three-period-cross-over trial in children
eligible for GINA step 3 [49]. The BADGER trial is
clearly relevant to the topic of this review. The study
addresses the research question which of the three
medication options (doubling the dose of the inhaled
corticosteroid, adding LABA or LTRA) should be the first
choice of treatment in step 3 of the guidelines. Because
of its importance to the research question of this review,
we will discuss it in some more detail. The BADGER
investigators assigned 182 children, from 6 to 17 years
of age with asthma that was uncontrolled despite receiving a low
dose ICS, to receive each of three blinded step-up
therapies, corresponding with GINA step 3A, 3B and 3C, in
random order for a period of 16 weeks each. Several
clinical and physical aspects were measured, including
the need for oral prednisone, an asthma control test and
FEV1. The main outcome was that overall, LABA as add-on
(GINA 3A) performed better than increasing ICS dose
(GINA 3B) or adding LTRA (GINA 3C). Furthermore,
subgroup analyses were performed to predict the
direction of the patterns of differential response, primarily on
baseline values of PC20, Asthma Control Test scores and
genotype, and, post hoc, on demographic and physiological
characteristics. The only significant (p=0.009) predictor was
the baseline Asthma Control Test score (</>19) on the
probability of the best response to LABA step-up.

Strengths of the BADGER trial are the topical research
questions and relevant outcome measures.
Furthermore, sensitivity analyses were performed to assess bias,
for example seasonal differences. However, the treatment
period-specific results were not reported separately,
which was the main reason why we could not use the
trial in this review with network meta-analysis. In
addition, the study is hampered by the cross-over design
with possible carry-over effects of ICS treatment. A
wash-out period of four weeks makes using the second
and third treatment periods hazardous due to
unquantifiable carry-over effects [16]. Carry-over effect of ICS
would have improved the treatment effects of adding
LABA or LTRA. Furthermore, post hoc analysis with
relatively small subgroups already raised much discussion
and suggests hypotheses that need more research in
studies with a different design [50-54].

Although GINA provides us with treatment
recommendations, steps 3&4 are still not based on sound evidence.
For patients, their parents, and physicians alike, uncertainty
about the best treatment remains. New trials should focus
on add-on therapy to ICS in children. Ongoing and new
RCTs will be part of meta-analysis in a few years. To
interpret individual studies, consensus about design and
reporting of outcome measurements for RCTs would provide a
much better evidence base for the future. In 2009 an official
American Thoracic Society/European Respiratory Society
statement about standardizing endpoints for clinical
asthma trials and clinical practice was published [55]. A
taskforce formulated recommendations of assessment for
the design, conduct and evaluation of asthma trials for
clinicians, researchers, and other relevant groups. These
recommendations form an excellent starting point for
harmonization of outcome measures and accompanying
inferential statistical measures in RCTs and other
comparative effectiveness research. As far back as 1992, Tugwell
and Boers introduced a solution for Rheumatoid Arthritis
Clinical Trials, OMERACT ("Outcome Measures in
Rheumatoid Arthritis Clinical Trials") [56]. OMERACT, an
international informal network, strives to improve outcome
measurement through a data driven, iterative consensus
process involving relevant stakeholder groups. This type of
initiative would be welcome in asthma research too.

Another solution may be prospective meta-analysis
(PMA) [17,42]. PMA meta-analyses RCTs, preferably by
using IPD, that were identified, evaluated and determined
to be eligible for the meta-analysis before the results of
any of those studies become known. PMA was developed
to overcome some of the problems of normal
(retrospective) meta-analyses, mainly to enable hypotheses to be
specified a priori and ignorant of the results of individual
trials. Ideally, PMA provides standardization of clinical
trial procedures, such as study design and data collection
methods, by using, for example, the same instruments and
the same time points for measuring outcomes.

Conclusion 

Due to extreme variation in choice of outcome measures
and their reporting, firm evidence-based ranking of
effectiveness of the treatment options in GINA 3&4 for 5
to 18 year-olds based on evidence from randomized
trials is currently impossible. Implementation of the
recommendations issued by the recent ATS/ERS
taskforce on measures of asthma control in RCTs is urgent.
Glossary of some terms

Level A evidence: a substantial number of well designed RCTs exist, with
substantial numbers of participants, in the recommended population, with
consistent patterns of findings (1).

Level B evidence: few RCTs exist; they are small in size, undertaken in a
different population or results are not consistent (1).

Carry-over effect: the persistence of a treatment applied in one period in a
subsequent period of treatment (2).